Flow-Based Unconstrained Lip to Speech Generation
Authors
Abstract
Unconstrained lip-to-speech aims to generate the corresponding speech from silent facial videos, with no restriction on head pose or vocabulary. In unconstrained settings it is desirable to generate intelligible and natural speech at a fast speed. Currently, to handle these more complicated scenarios, most existing methods adopt an autoregressive architecture optimized with an MSE loss. Although these methods have achieved promising performance, they are prone to issues including high inference latency and mel-spectrogram over-smoothness. To tackle these problems, we propose a novel flow-based non-autoregressive model (GlowLTS) that breaks these constraints and achieves faster inference. Concretely, the decoder is trained by maximizing the likelihood of the training data and is capable of non-autoregressive generation. Moreover, we devise a condition module to improve the intelligibility of the generated speech. We demonstrate the superiority of our proposed method through objective and subjective evaluations on the Lip2Wav-Chemistry-Lectures and Lip2Wav-Chess-Analysis datasets. Our demo video can be found at https://glowlts.github.io/.
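The abstract describes a flow-based, non-autoregressive decoder trained by maximizing the exact likelihood of the data under a visual condition. As a rough illustration only, the sketch below shows one Glow-style affine-coupling step together with a change-of-variables likelihood term; it is not the authors' GlowLTS code, and the module names, tensor shapes, and conditioning features are illustrative assumptions.

```python
# Minimal, hypothetical sketch of one affine-coupling step of the kind used in
# flow-based decoders (Glow-style). NOT the authors' GlowLTS implementation.
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """Split channels, predict a scale/shift for one half from the other half
    plus a condition, and return the transformed tensor with its log-determinant."""

    def __init__(self, channels: int, cond_channels: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(channels // 2 + cond_channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            # outputs log-scale and shift for the second half of the channels
            nn.Conv1d(hidden, channels, kernel_size=3, padding=1),
        )

    def forward(self, x, cond):
        xa, xb = x.chunk(2, dim=1)                      # split along channels
        log_s, t = self.net(torch.cat([xa, cond], dim=1)).chunk(2, dim=1)
        log_s = torch.tanh(log_s)                       # keep scales bounded for stability
        yb = xb * torch.exp(log_s) + t                  # affine transform of the second half
        logdet = log_s.sum(dim=[1, 2])                  # per-sample log|det Jacobian|
        return torch.cat([xa, yb], dim=1), logdet

# Training maximizes the exact likelihood via the change-of-variables formula:
# log p(x | cond) = log N(z; 0, I) + sum of log-determinants over all coupling steps.
if __name__ == "__main__":
    x = torch.randn(2, 80, 100)      # e.g. a batch of 80-bin mel-spectrograms, 100 frames (assumed)
    cond = torch.randn(2, 64, 100)   # hypothetical visual-encoder features aligned to frames
    layer = AffineCoupling(channels=80, cond_channels=64)
    z, logdet = layer(x, cond)
    nll = 0.5 * (z ** 2).sum(dim=[1, 2]) - logdet       # negative log-likelihood up to constants
    print(z.shape, nll.mean().item())
```

A full model would stack many such invertible steps and minimize the negative log-likelihood directly instead of an MSE on mel-spectrograms, which is the training difference the abstract associates with avoiding over-smoothness; generation then runs all frames in parallel by inverting the flow from sampled noise, which is what makes the non-autoregressive inference fast.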
Similar Resources
Evaluation of a formant-based speech-driven lip motion generation
The background of the present work is the development of a tele-presence robot system where the lip motion of a remote humanoid robot is automatically controlled from the operator’s voice. In the present paper, we introduce an improved version of our proposed speech-driven lip motion generation method, where lip height and width degrees are estimated based on vowel formant information. The meth...
A Fluid Flow Approach to Speech Generation
A fluid dynamic formulation of speech generation may lead to an improved understanding of the physics of speech production. Unlike more traditional linear acoustic methods of speech synthesis, this alternate approach aims to capture more of the relevant physics by numerically solving a form of the Reynolds-Averaged Navier-Stokes equations describing fluid motion. Though computationally intensiv...
Face and Lip Localization in Unconstrained Imagery
When combined with acoustical speech information, visual speech information (lip movement) significantly improves Automatic Speech Recognition (ASR) in acoustically noisy environments. Previous research has demonstrated that visual modality is a viable tool for identifying speech. However, the visual information has yet to become utilized in mainstream ASR systems due to the difficulty in accur...
Subjective Evaluation for HMM-Based Speech-To-Lip Movement Synthesis
An audio-visual intelligibility score is generally used as an evaluation measure in visual speech synthesis. In particular, the intelligibility score of talking heads reflects the accuracy of the facial models[1][2]. The facial models involve two stages: construction of real faces and realization of dynamical human-like motions. We focus on lip movement synthesis from input acoustic speech to realize d...
Speech-driven lip motion generation with a trajectory HMM
Automatic speech animation remains a challenging problem that can be described as finding the optimal sequence of animation parameter configurations given some speech. In this paper we present a novel technique to automatically synthesise lip motion trajectories from a speech signal. The developed system predicts lip motion units from the speech signal and generates animation trajectories autom...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2022
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v36i1.19966